Smart Paper Notes

FEMa-FS: Finite Element Machines for Feature Selection

Lucas Biaggi , João P. Papa , Kelton A. P. Costa , Danillo R. Pereira , Leandro A. Passos

Categories: Machine Learning | Artificial Intelligence

2022-12-05

Identifying anomalies has become one of the primary strategies towards security and protection procedures in computer networks. In this context, machine learning-based methods emerge as an elegant solution to identify such scenarios and learn irrelevant information so that a reduction in the identification time and possible gain in accuracy can be obtained. This paper proposes a novel feature selection approach called Finite Element Machines for Feature Selection (FEMa-FS), which uses the framework of finite elements to identify the most relevant information from a given dataset. Although FEMa-FS can be applied to any application domain, it has been evaluated in the context of anomaly detection in computer networks. The outcomes over two datasets showed promising results.

Active learning using adaptable task-based prioritisation

Shaheer U. Saeed , João Ramalhinho , Mark Pinnock , Ziyi Shen , Yunguan Fu , Nina Montaña-Brown , Ester Bonmati , Dean C. Barratt , Stephen P. Pereira , Brian Davidson

Categories: Computer Vision

2022-12-03

Supervised machine learning-based medical image computing applications necessitate expert label curation, while unlabelled image data might be relatively abundant. Active learning methods aim to prioritise a subset of available image data for expert annotation, for label-efficient model training. We develop a controller neural network that measures priority of images in a sequence of batches, as in batch-mode active learning, for multi-class segmentation tasks. The controller is optimised by rewarding positive task-specific performance gain, within a Markov decision process (MDP) environment that also optimises the task predictor. In this work, the task predictor is a segmentation network. A meta-reinforcement learning algorithm is proposed with multiple MDPs, such that the pre-trained controller can be adapted to a new MDP that contains data from different institutes and/or requires segmentation of different organs or structures within the abdomen. We present experimental results using multiple CT datasets from more than one thousand patients, with segmentation tasks of nine different abdominal organs, to demonstrate the efficacy of the learnt prioritisation controller function and its cross-institute and cross-organ adaptability. We show that the proposed adaptable prioritisation metric yields converging segmentation accuracy for the novel class of kidney, unseen in training, using between approximately 40% to 60% of labels otherwise required with other heuristic or random prioritisation metrics. For clinical datasets of limited size, the proposed adaptable prioritisation offers a performance improvement of 22.6% and 10.2% in Dice score, for tasks of kidney and liver vessel segmentation, respectively, compared to random prioritisation and alternative active sampling strategies.
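The abstract reports its gains in Dice score. As a reminder of what that metric measures, here is a minimal sketch for binary masks, assuming NumPy; the function name `dice_score` and the convention of returning 1.0 for two empty masks are illustrative choices, not taken from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denom

# Hypothetical 1-D "masks": one overlapping pixel, three foreground pixels in total.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

For multi-class segmentation, a common approach is to compute this per class and average, though the paper's exact evaluation protocol is not stated in the abstract.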

ComplexWoundDB: A Database for Automatic Complex Wound Tissue Categorization

Talita A. Pereira , Regina C. Popim , Leandro A. Passos , Danillo R. Pereira , Clayton R. Pereira , João P. Papa

Categories: Computer Vision | Machine Learning

2022-09-26

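Complex wounds usually involve partial or total loss of skin thickness and heal by secondary intention. They can be acute or chronic and may present infection, ischemia, and tissue necrosis, as well as association with systemic diseases. Research institutes around the globe report countless cases, which amount to a severe public health problem, since they consume human resources (e.g., physicians and health care professionals) and negatively impact quality of life. This paper presents a new database for automatically categorizing complex wounds into five categories, i.e., non-wound area, granulation, fibrinoid tissue, dry necrosis, and hematoma. The images comprise different scenarios of complex wounds caused by pressure, vascular ulcers, diabetes, burns, and complications after surgical interventions. The dataset, called ComplexWoundDB, is unique because it provides pixel-level classifications for 27 images obtained in the wild, i.e., images collected at the patients' homes and labeled by four health professionals. Further experiments with distinct machine learning techniques evidence the challenges in addressing the problem of computer-aided complex wound tissue categorization. The manuscript sheds light on future directions in the area and provides a detailed comparison with other databases widely used in the literature.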

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Marília Costa Rosendo Silva , Felipe Alves Siqueira , João Pedro Mantovani Tarrega , João Vitor Pataca Beinotti , Augusto Sousa Nunes , Miguel de Mattos Gardini , Vinícius Adolfo Pereira da Silva , Nádia Félix Felipe da Silva , André Carlos Ponce de Leon Ferreira de Carvalho

Categories: Machine Learning | Natural Language Processing | Machine Learning (Statistics)

2022-08-02

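Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. Initialization can lead to variability, depending on the machine learning algorithm. Furthermore, distortions can be misleading with regard to cluster geometry. Among the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology, since similar procedures go by different names. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, factorization, and clustering algorithms that are directly or indirectly related to the reviewed works.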

A Benchmark dataset for predictive maintenance

Bruno Veloso , João Gama , Rita P. Ribeiro , Pedro M. Pereira

Categories: Machine Learning | Artificial Intelligence

2022-07-12

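The paper describes a railway dataset, the outcome of a predictive maintenance project with an urban metro public transportation service in Porto, Portugal. The data were collected between 2020 and 2022 to support the development of machine learning methods for online anomaly detection and failure prediction. By capturing several analogue sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed), we provide a framework that can be easily used for developing new machine learning methods. We believe this dataset contains some interesting characteristics and can be a good benchmark for predictive maintenance models.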

Benchmarking Counterfactual Algorithms for XAI: From White Box to Black Box

Catarina Moreira , Yu-Liang Chou , Chihcheng Hsieh , Chun Ouyang , Joaquim Jorge , João Madeiras Pereira

Categories: Machine Learning | Artificial Intelligence

2022-03-04

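This study investigates the impact of machine learning models on the generation of counterfactual explanations by conducting a benchmark evaluation over three different types of models: a decision tree (a fully transparent, interpretable, white-box model), a random forest (a semi-interpretable, grey-box model), and a neural network (a fully opaque, black-box model). We tested the counterfactual generation process using four algorithms (DiCE, WatcherCF, prototype, and GrowingSpheresCF) on five different datasets (COMPAS, Adult, German, Diabetes, and Breast Cancer). Our findings indicate that: (1) different machine learning models have no impact on the generation of counterfactual explanations; (2) counterfactual algorithms based solely on proximity loss functions are not actionable and will not provide meaningful explanations; (3) one cannot obtain meaningful evaluation results without guaranteeing plausibility in the counterfactual generation process, and algorithms that do not consider plausibility in their internal mechanisms will lead to biased and unreliable conclusions if evaluated with the current state-of-the-art metrics; (4) a qualitative analysis is strongly recommended (together with a quantitative analysis) to ensure a robust analysis of counterfactual explanations and the potential identification of biases.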

Covered Information Disentanglement: Model Transparency via Unbiased Permutation Importance

João Pereira , Erik S. G. Stroes , Aeilko H. Zwinderman , Evgeni Levin

Categories: Machine Learning | Artificial Intelligence

2021-11-18

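Model transparency is a prerequisite in many domains and an increasingly popular area of machine learning research. In the medical domain, for instance, unveiling the mechanisms behind a disease often has higher priority than the diagnosis itself, since it might dictate or guide potential treatments and research directions. One of the most popular approaches to explaining a model's global predictions is permutation importance, where the performance on permuted data is benchmarked against the baseline. However, this method and other related approaches will undervalue the importance of a feature in the presence of covariates, since these cover part of the information it provides. To address this issue, we propose Covered Information Disentanglement (CID), a method that considers all feature information overlap to correct the values provided by permutation importance. We further show how to compute CID efficiently when coupled with Markov random fields. We demonstrate its efficacy in adjusting permutation importance first on a controlled toy dataset and then discuss its effect on real-world medical data.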
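The baseline that this abstract builds on, plain permutation importance, can be sketched in a few lines. The code below is a minimal illustration assuming NumPy; it is not the CID correction itself, and the helper names (`permutation_importance`, `r2`), the toy data, and the hand-written "model" are all hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, y, score, n_repeats=10, seed=0):
    """Mean drop in score when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y but keeps its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

def r2(y_true, y_pred):
    """Coefficient of determination, used here as the performance score."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]
# Stand-in for a fitted model: it simply applies the true function.
predict = lambda A: 3.0 * A[:, 0] + 0.5 * A[:, 1]
imp = permutation_importance(predict, X, y, r2)
```

Here the features are independent, so the ranking is trustworthy. The abstract's point is that with correlated covariates these raw values can undervalue a feature, which is the case CID is designed to correct.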

Interpretable Models via Pairwise permutations algorithm

Troy Maaslandand , João Pereira , Diogo Bastos , Marcus de Goffau , Max Nieuwdorp , Aeilko H. Zwinderman , Evgeni Levin

Categories: Machine Learning | Machine Learning (Statistics)

2021-11-17

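One of the most common pitfalls in high-dimensional biological datasets is correlation between features. This may lead statistical and machine learning methods to overvalue or undervalue these correlated predictors, while the truly relevant ones are ignored. In this paper, we define a new method called the pairwise permutation algorithm (PPA), which aims to mitigate the correlation bias in feature importance values. First, we provide a theoretical foundation that builds upon previous work on permutation importance. PPA is then applied to a toy dataset, where we demonstrate its ability to correct the correlation effect. We further test PPA on a microbiome shotgun dataset to show that it is already able to obtain biologically relevant biomarkers.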

Identifying Latent Stochastic Differential Equations

Ali Hasan , João M. Pereira , Sina Farsiu , Vahid Tarokh

Categories: Machine Learning (Statistics) | Machine Learning

2020-07-12

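We present a method for learning latent stochastic differential equations (SDEs) from high-dimensional time series data. Given a high-dimensional time series generated from a lower-dimensional, unknown latent Itô process, the proposed method learns the mapping from ambient to latent space, together with the underlying SDE coefficients, through a self-supervised learning approach. Using the framework of variational autoencoders, we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on the identifiability of latent variable models to show that, in the limit of infinite data, the proposed model can recover not only the underlying SDE coefficients but also the original latent variables, up to an isometry. We validate the method through several simulated video processing tasks, where the underlying SDE is known, and through real-world datasets.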
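The Euler-Maruyama approximation mentioned in this abstract discretises an SDE dX = f(X) dt + g(X) dW as X[k+1] = X[k] + f(X[k]) Δt + g(X[k]) ΔW, with ΔW drawn from a normal distribution of variance Δt. The following is a minimal one-dimensional sketch assuming NumPy; the function name, the Ornstein-Uhlenbeck example, and its parameters are illustrative and not taken from the paper.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt, n_steps, seed=0):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW in 1-D."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Brownian increment over a step of length dt: N(0, dt).
        dw = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + drift(x[k]) * dt + diffusion(x[k]) * dw
    return x

# Hypothetical example: an Ornstein-Uhlenbeck process pulled back towards 0.
theta, sigma = 1.0, 0.3
path = euler_maruyama(lambda x: -theta * x, lambda x: sigma,
                      x0=2.0, dt=0.01, n_steps=1000)
```

In the paper's setting the drift and diffusion would be learned coefficients acting on latent variables; here they are fixed functions so the update rule itself is easy to see.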

Generating music with sentiment using Transformer-GANs

Pedro Neves , Jose Fornari , João Florindo

Categories: Machine Learning

2022-12-21

The field of Automatic Music Generation has seen significant progress thanks to the advent of Deep Learning. However, most of these results have been produced by unconditional models, which lack the ability to interact with their users, not allowing them to guide the generative process in meaningful and practical ways. Moreover, synthesizing music that remains coherent across longer timescales while still capturing the local aspects that make it sound "realistic" or "human-like" is still challenging. This is due to the large computational requirements needed to work with long sequences of data, and also to limitations imposed by the training schemes that are often employed. In this paper, we propose a generative model of symbolic music conditioned by data retrieved from human sentiment. The model is a Transformer-GAN trained with labels that correspond to different configurations of the valence and arousal dimensions that quantitatively represent human affective states. We try to tackle both of the problems above by employing an efficient linear version of Attention and using a Discriminator both as a tool to improve the overall quality of the generated music and its ability to follow the conditioning signals.
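The abstract does not say which linear attention variant is used. As one possible reading, the sketch below shows a kernelised, non-causal linear attention with an elu(x) + 1 feature map, assuming NumPy. The feature map, the function names, and the shapes are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def feature_map(x):
    """elu(x) + 1, a positive feature map (an assumed choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Attention in O(n) sequence length: phi(Q) (phi(K)^T V) / normaliser."""
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V              # (d, d_v): summary of keys and values, built once
    Z = Qf @ Kf.sum(axis=0)    # (n,): per-query normaliser
    return (Qf @ KV) / Z[:, None]

def quadratic_attention(Q, K, V):
    """Same result computed through the explicit n x n weight matrix."""
    A = feature_map(Q) @ feature_map(K).T
    return (A / A.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
out = linear_attention(Q, K, V)
```

The two functions agree numerically; the linear form simply reorders the matrix products so that the n x n attention matrix is never materialised, which is what makes long sequences cheaper to process.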
